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Abstract — Super-resolution (SR) techniques make use of sub- 
pixel shifts between frames in an image sequence to yield higher- 
resolution images. We propose an original observation model 
devoted to the case of non isometric inter-frame motion as 
required, for instance, in the context of airborne imaging sensors. 
First, we describe how the main observation models used in 
the SR literature deal with motion, and we explain why they 
are not suited for non isometric motion. Then, we propose an 
extension of the observation model by Elad and Feuer adapted to 
affine motion. This model is based on a decomposition of affine 
transforms into successive shear transforms, each one efficiently 
implemented by row-by-row or column-by-column 1-D affine 
transforms. 

We demonstrate on synthetic and real sequences that our 
observation model incorporated in a SR reconstruction technique 
leads to better results in the case of variable scale motions and 
it provides equivalent results in the case of isometric motions. 

Index Terms — Super- Resolution, affine motion, multi-pass in- 
terpolation, bspline, L2 approximation, projection, inverse prob- 
lems, convex regularization. 



I. Introduction 

SUPER-RESOLUTION (SR) techniques aim at estimating 
a high-resolution image with reduced ahasing, from a 
sequence of low-resolution (LR) frames. The literature on the 
subject is abundant, see [1-6] and [7] for a recent review. 

Our contribution deals with the class of "Reconstruction 
Based" SR techniques [8], which can be split in three steps: (1) 
estimation of inter- frame motion; (2) computation of a linear 
observation model including motion; (3) regularized inversion 
of the linear system. 

We are interested in aerial imaging applications which often 
imply non isometric motion, as in the case of an airborne 
imager getting closer to the observed scene, see Sec. VI- 
C. Such non isometric motion fields can be estimated using 
various registration algorithms [9, 10]. Hence, step (1) is not 
the main issue in this context. Concerning step (2), the SR 
literature is rather allusive: most published methods implicitly 
assume translational motion [1,4,6,8, 11-19]. To the best of 
our knowledge, if some former contributions apply to affine 
[20,21] or even homographic [9,22] warps none of them 
explicitly deals with variable distance from scene to imager 
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in step (2)^. So, we focus on the construction of a proper 
observation model for affine motions with consistent scale 
changes. 

Section II proposes a bibliographical survey of the SR 
literature, with respect to the observation model. It is shown 
that published methods are not adapted to the considered 
context: its main difficulty is to account for non translational 
motion in a tractable discrete model. 

Section III is devoted to the proposed new observation 
model that extends the popular one introduced by Elad and 
Feuer [5] by replacing traditional pointwise interpolation by 
techniques based on L2 approximations and shifted bspline 
basis [23]. We show that our model leads to a more precise 
prediction of LR frame pixel values, in the case of a combined 
zoom and rotation motion. 

Further comparisons are performed on SR reconstruction 
results. Section IV briefly introduces the convex regularization 
framework that we use for SR reconstruction. Such techniques 
are customary in various inverse problems, including restau- 
ration and SR [2,5,24]. 

We use the resulting SR reconstruction technique to com- 
pare various observation models on synthetic (section V) 
and real (section VI) datasets. These experiments consistently 
show that our model is more accurate and reliable for se- 
quences combining rotation and important scale changes, at 
the expense of a moderate increase of computational load. 

II. Analysis of previous works 

This section describes several published observation models 
differing by the way they account for motion through numer- 
ical approximations. 

A. Notations 

Uppercase letters (resp. boldface letters) refer to matrices 
(resp. vectors), n = [n,/]* G and i = [i^jf ^ denote 
discrete positions of LR and SR pixels and n = [li, v]* G 
denotes real positions on the image plane. An image x can 
be described by a continuous field x{u), or by a sequence 
of discrete coefficients x [i] and as lexicographycally ordered 
vector X. 

B. General observation model 

Let x{.) be the input irradiance field and y[.] be the 
observed LR image. is a sampled version of the convolution 
of X with an optical point spread function (PSF) ho integrated 

^It is addressed formally in [3] but not implemented nor demonstrated. 
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by a box function / corresponding to the collecting surface of 
the detector: 



y 



[n] — / {ho * x) (nA — v) I {v) dv , 



with n G 5a- 5a C is the set of discrete detectors 
positions on a grid with step A. Let us denote N = Card (Ga) 
the number of LR pixels in frame y. 

It is customary to define a joint optics-plus-detector PSF 

= /lo * / SO that y[n] = x (nA). 

SR methods rely on the usual "brightness constancy" as- 
sumption which is the basis of many motion estimation tech- 
niques, in particular intensity-based techniques [10]. In this 
framework, SR methods assume that temporally neighboring 
frames originate from a unique input x up to a warp modeling 
relative sensor /scene motion. 

Let yk (k = 1, . . . , iC) denote a neighboring frame of y, 
then (/) yk derives from an irradiance field Xk through sensor 
h\ yk [n] = * Xk (nA) and (//) there is a warp Wk, such that 
Xk {u) = X {wk {u)). Combination of both equalities yields 



yk [n]=h^ {xowk) (nA) . 



(1) 



The next step is discretization of x for the sake of numerical 
computations. The irradiance field x is decomposed on a 
shifted kernel basis: 



X (u) 



E 



x[i] (fi{u — iA') 



(2) 



5a' is the SR grid, with step A' and M = Card (5aO is 
the number of SR pixels. The ratio L = A/ A' defines the 
practical magnification factor (PMF) of the SR process: it is 
usually greater than two. Note that it does not imply that the 
actual resolution improvement is as high as the PMF. 

may be any classical interpolation kernel (box function, 
bilinear, ...). In the sequel, we use bspline basis, which encom- 
pass most classical interpolation schemes [25-27] . Then (/:) is a 
separable bspline kernel of order m: (/? {u) = (3'^ (u) (3^ (v), 
where (3^(u) is the (m+l)-fold convolution of a box function. 
Let us rewrite (1) as: 



yk[n]= I x{wk{v))h{nA- v) dv . (3) 
Injecting (2) yields: 



Vk [n] = ^ ak[n,i]x[i] 



dk 



[n,i]= / (p{wk{v) - iA') h{nA- v)dv . (4) 

Using lexicographically ordered vector representation of im- 
ages, a matrix formulation writes: 

yk = AkX . 

The whole matrix A = [Ai . . . A^]* is huge with dimensions 
KN X M, M ^ NL'^. For instance, a sequence of i^T = 
10 frames, with dimensions N = 128^ and a PMF L = 2 
leads to about 43 billion elements. Of course, A^ is a sparse 
matrix with a band structure, as practical PSF h spreads over 
two or three LR pixels at most and (/? is a separable bspline 



kernel, whose support is (m + 1)A' wide. However, the cost 
of computing all non zero elements of A remains formidable 
for general warps Wk- 

In the following, we review landmark SR papers with 
respect to the way they compute A. We discuss three main 
approaches: 

1) Exact computation for special cases of Wk^h and (p 

2) Convolve -then-Warp approximation 

3) Warp-then- Convolve approximation 

C Exact computation 

Exact computation is tractable only in two special cases: 

• motion is a global translation; 

• (f and h are both box functions and motion is affine. 

1) Global translation: When Wk is a global translation, 
(1) leads to a simple convolution. Indeed, replacing Wk{u) = 
u — Tk inside (4) yields: 

a/e[n, i] = cp ^ h (nA — iA' — Tk) , 

and the observation equation writes: 

yk[n] = ^(p^h inLA' - iA' - Tk) x[i]= gk^x [nL] 

i 

with gk{u) = ((/9 * h){uA' — Tk)- For a given integer L, each 
LR frame appears as a subsampled version of the discrete 
convolution of x with kernel gk- 

Most of the early SR literature is devoted to this global 
translation case. It naturally leads to either Fourier tech- 
niques [1,11,12] or equivalent multi-channel filtering tech- 
niques [13] based on the generalized Papoulis theorem [28]. 

2) (p and h are box funtions: When if and h are box 
functions [2,3,29], (4) is the common area between each 
detector and each warped SR pixel (see Fig. 1). 




Fig. 1. Lp and h are assumed box functions and motion is a rotation. 
The fine grid represents the grid of SR pixels, while the coarse one 
is the grid of detectors. Common areas between the middle detector 
and each SR pixel are colored. 

Such an observation model has been proposed by Stark and 
Oskoui for rotational warps [29]. No indication is provided in 
their paper about the numerical computation of the relevant 
intersections. 

Assuming affine motion, each warped SR pixel is a convex 
polygon, and computation of the intersection of two convex 
polygons can be performed by a "clipping" algorithm such 
as [30]. However, this technique is not suitable for SR purpose 
due to its high computational burden. 
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D. Convolve -then-Warp 

Let us Start back from (3). In practice, h scarcely spreads 
over two or three LR pixels, thus integral (3) extends on a 
neighborhood V (nA) around nA. Let us assume that Wk {u) 
can be locally approximated by a translation: 

Wkiu) ~ Wk{nA) -\- u — nA , n G V (nA) . 

Then (3) can be approximated by a convolution: 

yk [n] ^ {h^ x){wk{nA)) . (5) 

Such an approximation is depicted in Fig. 2. The center of 
each detector is well positioned, but the integration area is a 
rough approximation. Such an approximation leads to errors 
in the integration step for large rotations and scale variations. 




(a) Correct detector integration area. (b) Local translation approximation of 

warp and resulting detector area. 

Fig. 2. llustration of the Convolve-then- Warp approximate model (5): white 
regions are not accounted for, gray ones are integrated once while black 
regions are incorrectly integrated in two detectors output. 

The discretization of this model is much easier than the 
general model (1), because it is an irregular sampling of a 
convolution. The simple model of Schultz and Stevenson [2] 
is a special case of this approach when h and (p are both 
box functions and the detector center positions are rounded 
to integer multiples of A^ Then, the components a/e[n,i] are 
binary, with a/c[n, i] = 1 if the i-th SR pixel is inside the n-th 
detector area, approximated as in Fig. 2(b). A refined version 
of this model is used in [31]. 

As a conclusion, this model appears computationnaly at- 
tractive but is clearly unable to correctly account for non- 
translational warps because of the fixed detector geometry (see 
Fig. 2). 

E. Warp -then- Convolve 

This approach consists in using the convolution relation- 
ship (1) between the data and the warped SR image 
Xk{u) = x{wk{u)). If a discretized version Xk of Xk over 
the A' -shifted basis functions (p is available, (1) can easily be 
discretized as: 

= BUxk 

where D is a down-sampling matrix, and H is the convolution 
matrix associated to the optical-plus-detector response. 



Now the main problem is to construct Xk using the dis- 
cretized SR image coefficients x[.] defined by (2). A first 
approach may be to enforce equality on the grid nodes: 

^ X, [i\ ^ {{I - i) AO = ^ X [j] ^ {w, (/AO - jAO . 

If (/p is a bspline of order m = or m = 1, it satisfies 
(p ((/ — i) A) = 6 {I — i), and we get: 

Xk [l] = ^ X [j] {wk {I A) - jA) . (6) 

In Other words, the discrete coefficient Xk [I] is the interpo- 
lation of X at point {I A). If (p is a. box function (m = 0), 
(6) reduces to nearest neighbor interpolation and if is a 
triangle function (m = 1), (6) is a bilinear interpolation. 

Interpolation (6) leads to the definition of a warping matrix 
Wfc, which summarizes all motion information. The complete 
image formation model is then: 

2/, = DHWfeX. (7) 

This is exactly the formulation proposed by Elad and Feuer [5, 
21] referred to as "E&F" model in the following. 

Fig. 3 summarizes this method: starting from the sought 
SR image Fig. 3(a), an intermediate high-resolution image 
Fig. 3(b) is constructed with a pixel grid aligned with the 
detector grid using either bilinear or nearest neighbor interpo- 
lation. Integration and subsampling are then straightforward. 




Fig. 3. Illustration of the E&F model: starting from SR image 
Fig. 3(a) an intermediate high-resolution image Fig. 3(b) is con- 
structed with a pixel grid aligned with the yk data detector grid using 
either bilinear or nearest neighbor interpolation. 

Compared to the previous approach, the E&F model seems 
much more precise for rotation warps. However, one can 
foresee aliasing problems in the case of scale changes due 
to the pointwise interpolation step (6). 

III. Proposed observation model 

This Section introduces an original observation model ex- 
tending the E&F model, by replacing pointwise interpola- 
tion (6) by a technique based on L2 function approximation. 

Dealing with variable scale using L2 approximation tech- 
nique is not easy in 2-D. In this context, Catmull and 
Smith [32] introduced an efficient decomposition of 2-D affine 
transforms into separable 1-D transforms. 

First, we will introduce such decomposition into our obser- 
vation model. Next, we focus on the 1-D operations in order to 
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achieve a L2 approximation on a bspline basis. Finally, we will 
compare observation models and point out the improvements 
provided by the proposed model. 

A. Warping decomposition 

Thevenaz and Unser showed that 2-D invertible affine 
transforms can be handled by two- shear or three- shear decom- 
positions [23]. Each shear is a vertical or horizontal coordinate 
transform such as: 



Su{u) 



a2 


1 

01 



P2 
1 





(8) 



(9) 



Both ot them are one-dimensional affine transforms separably 
applied row-by-row or column-by-column. As an example, 
Fig. 4 provides the intermediate images resulting of each shear 
of the following affine motion and decomposition: 



1 

-1/4 



1/4 A 
7/16; 



1 

-1/4 




1/2 



1/4 



(10) 




Tiginal image. 



(c) Vertical shear. 



Fig. 4. Example: the affine transform of (10) is decomposed in two 
steps. Each step is a shear along one coordinate image axe. 

This decomposition is not unique, and the choice of one 
particular decomposition impacts the transformed image qual- 
ity. Catmull and Smith [32] mentioned the bottleneck problem 
resulting from a down-scaling in one pass followed by up- 
scaling in the next pass, resulting in a loss of resolution. 

Many approaches have been proposed to minimize image 
degradation, depending on the considered transform. For in- 
tance, Paeth [33] has proposed a three-shear decomposition 
well-suited for rotation. Other authors refer to TV-pass decom- 
position [34]. 

Multi-pass interpolation techniques and their limitations are 
outside the scope of this article, the reader can refer to [34] 
for deeper insight. In the sequel, we consider only two-shear 
decompositions. In this case, there are two possibilities, and 
one selects the decomposition which reduces the involved 
scale variations [23,35,36]. 



B. 1-D affine transform approximation 

Let us consider a 1-D affine transform with parameters 
(a, r): f{u) f {{u — r)/a). With this notation, a < 1 yields 
a signal reduction and a > 1 yields a signal magnification. 
It is clear that signal reduction may result in important dis- 
cretization errors (as naive subsampling undergoes a frequency 
aliasing). 

In the line of Thevenaz et al [23], let us decompose / on 
the 1-D shifted bspline basis: 

/H= E f[k]P'^{u-k) , (11) 
keGQ 

where Qq (Z TL denotes the set of Q discrete samples (for 
instance the set of pixels of a row of the image). We search 
for coefficients ^ [/c], /c G 5q such that g, defined by 

g{u) = E g\k\r{u-k) , (12) 

achieves the best approximation of f{{u — r)/a) in the L2 
sense, i.e. minimization of / [/ {{u — r)/a) — g{u)]^ du. The 
approximation is the orthogonal projection, and the optimal 
coefficients satisfy the orthogonality equations 



g{u)-f 



/?™ (w - k) 



0, (13) 



for k gQq. Replacing (11) and (12) in (13) yields: 

3 I 

with {u) = p"^ (u/a) I a and ^ = P^^P"^- The so-called 
bi-kernel encodes the geometric transform of a sample to 
a different scale space [36], and actually provides an optimal 
anti-aliasing filter [37]. If a 7^ 1, is not a bspline kernel, 
but remains a piecewise polynomial. A closed form expression 
of is provided in [35]. 

Finally, the sought coefficients g [k] write: 



g[k] = {0' 



« E/wc(^-r-«o I , (14) 

leQQ 



and the inverse filter can be efficiently imple- 

mented through recursive filtering [27]. 

To sum up the process, given a sequence of signal samples 
f {k) and 1-D affine transform parameters (a, r) the approxi- 
mation goes through four steps: 

1) compute bspline coefficients / [k]; 

2) compute the bi-kernel function 

3) compute g [k] with (14) and 

4) post-filter coefficients g [k] to get samples values g {k). 

Remark 1 — The first and the last steps are not required 
when the bspline representation order m is or 1. Indeed, 
for these particular orders, bspline coefficients are identical 
to image samples. 

Remark 2 — In the translation case (a = 1), (,^{u) = 
^2m+ij^^j r/z^ L2 approximation then turns to a mere bspline 
interpolation with a higher-order kernel. 
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C. A two-shear observation model 

In the proposed model, the k-\h observed frame y^^ (in 
vector notation) writes: 

where Sj. and are shear operators. Each operator is an 1- 
D row-by -row (or column-by-column) affine transform, which 
is implemented as described in the previous section. In the 
sequel, we use an order-0 bpline kernel. Thus, as a conse- 
quence of Remark 2, our model is identical to that of Elad 
and Feuer with bilinear interpolation for translation motion. 
The resulting model is denoted TSO for Two-Shear model with 
0-order bspline basis. 



D. Comparing observation models 

In this section we illustrate the quality of each observation 
model compared to exact computation in the special case of h 
and chosen as box functions and affine motion, see Sec. II- 
C. 

We represent the components of the observation matrix 
a/e [n, •] for a unique LR pixel in the form of an image 
patch. This patch displays the weighting coefficients actually 
applied on SR image pixels for computing one LR detector 
output. The first rows of the following three arrays of patches 
show the exact components for rotation angles {0, 15, 30, 45} 
degrees, scale variations of 1 (Fig. 5(a)), 1.2 (Fig. 5(b)) and 
1.6 (Fig. 5(c)) and a PMF of 5. 

The remaining patches show the approximated components 
obtained using Elad and Feuer models with nearest neighbor 
interpolation (E&FO) or with bilinear interpolation (E&Fl) and 
the proposed model (TSO). 

The Convolve-then- Warp model is not presented, but would 
lead to the same image patch made of a fixed size square 
pattern, whatever rotation and zoom factor. 

Fig. 5 shows that E&FO is always incorrect even with 
limited rotations and/or scale variations. It is noticeable that 
in Fig. 5(a), some coefficients value reach two: some SR 
pixels (white colored) contribute twice to the detector. Such 
a behavior has been previously observed for the "Convolve- 
then-Warp" approach, see Fig. 2(b). In the same time several 
SR pixels do not contribute at all to the detector. 

E&Fl provides a better approximation. Still, contributions 
of SR pixels are not uniform inside the detector footprint. This 
is already observed in Fig. 5(a) with rotations, and takes more 
importance in Fig. 5(b) and Fig. 5(c) with scale factor and 
rotations. As E&Fl contributions appear as a smoothed version 
of E&FO ones, one wonders if a bicubic interpolation (E&F3) 
would give correct contributions. This is not the case, as shown 
by Fig. 6. Moreover, as bicubic interpolation does not preserve 
positivity, the E&F3 model exhibits negative contributions. 

Whatever the interpolation method, Elad and Feuer models 
become inaccurate for rotations as low as 15° and zooming 
factor as low as 20%. 

In contrast, the TSO observation model ensures that the 
contributions of SR pixels are uniform inside the detector 
footprint whatever rotation and/or scale factor being applied. 
Remaining differences between exact contributions and TSO 







15 30 

(a) scale factor 1 


45 


□□ 








□ 


n 




□ 






□ 


□ 



(b) scale factor 1.2. 



True ^^^^^ 


]□ 






m 




E&F1 ^^^^ 


ID 




TSO 


ID 





(c) scale factor 1.6. 

Fig. 5. Comparing observation models: SR pixels contributions to 
one detector. Scale factor 1.0 5(a), 1.2 5(b) and 1.6 5(c), rotation 
up to 45 degrees. The models being compared come from the E&F 
methods with order (E&FO) and order 1 (E&Fl) interpolation. The 
last line shows the proposed TSO model, while the first line shows 
the true contributions. 
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15 30 45 dearees 



Fig. 6. Comparing E&Fl model with an Elad and Feuer model with 
bicubic interpolation (E&F3), Scale factor 1.6 and rotation up to 45 
degrees. 

ones are located on the detector boundaries: TSO contributions 
spread on slightly more than true ones. 

IV. Regularization framework 

The inversion step is tackled within a classical convex 
regularization framework [24] as in many other SR methods [2, 
5]. The estimated SR image is the (possibly constrained) 
minimizer of a regularized criterion based on observation 
model and convex edge-preserving penalty: 

Jx{x) = J2\\yk-^r''''xf + xJ2i^4vlx) . (15) 

k cec 

The first term of criterion (15) is a least squares discrepancy 
between data and model output: A^°^®^ stands for the obser- 
vation model which is to be inverted and derives either from 
the Elad and Feuer approach or from the proposed model of 
Sec. III. The second term is a convex penalization term [24]. 
C is the set of cliques: it consists of all subsets of three 
adjacent pixels either horizontal, vertical and diagonal. Vc 
denotes a second-order difference operator within clique c. 
The regularization parameter A balances the trade-off between 
the two terms of the criterion. The potential ips is chosen as 
a L2 — Li hyperbolic function: 

ips (u) = 2s (^^/s^^^-^^? — . 

Parameter s sets the threshold between the quadratic behavior 
(iz <C s), which allows small pixel differences smoothing and 
the linear behavior (u s) aimed at preserving edges. The 
latter part produces a lower penalization of large differences 
compared to a pure quadratic function, has the same 
qualitative behaviour as the Ruber function of [2]. 

Finally, for a given observation model, four solutions are 
computed, based on: 

• quadratic penalty 

• quadratic penalty and positivity constraint 

• hyperbolic penalty 

• hyperbolic penalty and positivity constraint. 

The criterion is convex by construction and has a unique 
global minimizer. The optimization can be achieved by iter- 
ative gradient-like techniques [38] and we resort to a limited 
memory BFGS algorithm^. It belongs to the class of Quasi- 

^The implementation named VMLMB, has been provided by Eric Thiebaut 
(thiebaut @ obs .univ-ly on 1 .fr) . 



Newton algorithms which only requires evaluation of the cri- 
terion and its gradient (no second order derivative is explicitly 
needed) and it is known to have better convergence properties 
than gradient algorithms. 

V. Experiments with synthetic sequences 

This section presents the experiments conducted on syn- 
thetic sequences. Using synthetic sequences has two main 
advantages: 

• Sequences are built from a reference HR image which 
will later be used as a reference to compare with recon- 
structed SR images; 

• We control all imaging parameters: noise level, PSF and 
image size. Motion is exactly known too. 

A. Synthetic data 

To generate a sequence of LR frames, the observation 
matrices are computed exactly according to assumptions 
of Sec. II-C that (p and h are box functions. As previously 
said, such a technique is very time consuming. 

We simulate a smooth motion with a rotation up to 20 
degrees and a zoom factor up to 1.6. Each frame is 128 x 128 
and is built from a256x256 HR reference image. In Fig. 7(a), 
we show the first, middle and last frame generated from the 
reference HR image Lena. 




(b) Mire. 



Fig. 7. We show the first, middle and last frame of sequences 
Lena 7(a) and Mire 7(b). 

We also generate another sequence from a bitonal calibra- 
tion pattern named Mire. The first, middle and last image of 
the sequence are shown in Fig. 7(b). 

B. Results 

Four regularized solutions and three observation models 
(E&FO, E&Fl and TSO) are then available. Hence, we finally 
compare performances of 12 SR settings with respect to the 
reference HR image, by means of the PSNR (Peak Signal- 
to-Noise Ratio, PSNR= 201ogio (255/^6), with e the mean 
square error). For each setting, the presented result is obtained 
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with the best regularization parameter (i.e., selected to get the 
highest reachable PSNR). 

Let us first deal with the "Lena" sequence of Fig. 7(a). 
Fig. 8(a) sums up the performance levels which have been 
achieved. First note that, on these relatively smooth images, 
various regularization settings lead to similar performances, 
and unconstrained quadratic regularization suffices to obtain 
good results. But we observe strong differences between 
observation models. On the average, there is an improvement 
from 4 dB (noisy case) up to 6 dB (no noise) between the 
E&FO and the E&Fl models. Moreover, there is also a gain 
of 1 to 6 dB between the E&Fl model and the TSO model. 




L2 L2* L2L1 

(a) No additional noise. 



L2LV 



36,00 

35,50 

35,00 

_ 34,50 
00 

S 34,00 
a: 

^ 33,50 



35,99 35,^ 



33,00 
32,50 
32,00 
31,50 



35,25 35,25 



34,51 



31,6 



34,51 
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Fig. 8. SR performances on the Lena sequence. Three observation 
models (E&FO (cyan), E&Fl (magenta) and TSO (yellow)) and four 
criteria are compared. Solutions which use a positivity constraint are 
labelled with a star. 

Fig. 10 illustrates the differences between reconstructed SR 
images, using L2 — Li regularization and positivity constraint, 
depending on the chosen observation model. Once again, 
the reconstructed images shown on the first row of Fig. 10 
have been obtained with the best regularization parameters. 
The E&F reconstructions are slightly more blurred than the 
SR image obtained from the proposed TSO model. This is 
confirmed in the lower row which shows image error with 
respect to the reference HR image: the TSO observation model 
yields a better reconstruction on high frequency areas, like the 
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Fig. 9. SR Performances on sequence Mire. Three observation 
models (E&FO (cyan), E&Fl (magenta) and TSO (yellow)) and four 
criteria are compared. Positivity constraint is labelled with a star. 



feather on the hat or the eyes. 

We have also measured CPU time on a Pentium 4 at 
2.66GHz. For this particular sequence, one iteration duration 
is respectively 2.0 and 4.6 seconds, for E&FO and E&Fl 
methods. Our model requires 5.9 seconds per iteration. All 
methods converge roughly with the same number of iterations. 
Hence our method is 30% more time consuming than E&Fl. 

We now consider the bitonal "Mire" sequence shown in 
Fig. 7(b). Results are reported in Fig. 9 in terms of PSNR. As 
expected, this high-frequency sequence leads to much stronger 
differences between regularization terms and constraints. 

As previously, strong differences are observed between ob- 
servation models. On the average, there is a gain improvement 
from 5 dB (noisy case) up to 10 dB (no noise) between E&Fl 
model and TSO. Such an improvement is due to the high 
contrast in Mire image. Indeed, we know from Sec. III-D 
that our observation model does not induce non homogeneous 
contributions in the case of variable scale motion. The induced 
errors in the reconstructions are more visible in high contrast 
areas, see Fig. 11 compared to Fig. 10. 

We also note that, in the noiseless case, hyperbolic regu- 
larization does not improve performances of E&F methods. 
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Fig. 10. First row, reconstructed SR images. From left to right: E&FO, E&Fl and TSO observation model. All reconstructions are performed 
with a hyperbolic regularization and positivity constraint. Second row: differences between HR reference image and reconstructed SR images. 



whereas we notice a gain up to 1 dB on the average, with the 
TSO model. 

E&F reconstructions are much more noisy than the one 
obtained with the TSO model. Let us recall that these recon- 
struction are obtained with a regularization parameter adjusted 
to get the better PSNR w.r.t. the reference HR image. The 
selected regularization parameter is lower (10~^) with the TSO 
model than with E&F models (10~^). This might indicate 
that the more precise the model is the less it is necessary 
to regularize. In other words, regularization compensates for 
model errors which are lower with the proposed TSO model. 

By using synthetic sequences with rotational and variable 
scale motion, we have shown that the TSO observation model 
leads to better reconstructed SR images than E&F methods, 
whatever the regularization involved. 

As a general comment, it should be emphasized that perfor- 
mances are much more sensitive to a change of observation 
model than to a change of regularization. In other words, a 
good choice of the observation model leads to much higher 
improvement than changing the regularization term, at least in 
the context of rotation and scale variation explored here. 

VI. Experiments on real sequences 

In this section, we compare observation models on real 
sequences. We first discuss prior assumptions on the sequences 
with an emphasis on motion modelization and estimation, then 
we present the results obtained on two real datasets. 

A. General assumptions and motion estimation 

SR requires the knowledge of the sensor response and of the 
motion field between frames. We use the common box function 



model for the PSF. Note that all the tested observation models 
can accomodate a more general PSF. 

We restrict our experiments to affine motion between 
frames, since the proposed TSO model is limited to these 
motion fields. Affine model accurately describes the motion 
of a planar scene through orthographic projection [39]. Such 
assumptions are usually not valid on the whole field of view 
(except in special purpose experiments, see VI-B), nevertheless 
the affine motion model is often a good local approximation 
of complex motion fields [9], valid in a restricted part of 
the image support (see an example in the aerial sequence of 
Sec. VI-C). 

We focus on sequences which exhibit large affine motions, 
with total zoom factor greater than 1.4 and rotations higher 
than 20 degrees (with inter-frame zoom up to 1.2 and rotation 
5 degrees). Note that such experimental settings are not 
considered in the previous papers on SR, even those which 
address the non translational context [9,22]. 

The first problem is to register each image of the sequence 
with respect to the reference image (usually the more resolved 
one). In this context, direct intensity based methods, which 
minimize a DFD (displaced frame difference) criterion are 
subject to false local minima, even using a multiresolution 
approach. This is due to the sensitivity of DFD criterion with 
respect to large rotational and scale changes. Hence, we use a 
two-step approach: 

1) compute a rough affine motion from scale-invariant 
keypoints matching; 

2) refine the affine model using multiresolution DFD min- 
imization on a restricted part of the image. 
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(a) E&FO. 



(b) E&Fl. 



(c) TSO. 



Fig. 11. Top-left parts of SR reconstructed images with hyperbolic regularization and positivity constraint. 11(a): E&FO model, 11(b): 
E&Fl, and 11(c): proposed TSO model. Parameters have been adjusted to get the best PSNR w.r.t. HR reference image. 



The first step uses Scale-Invariant Fast Transform (SIFT) 
keypoints of D. Lowe [40]. We match hundreds of keypoints 
between the considered frame and the reference one by SIFT 
descriptor correlation, then we robustly fit an affine model on 
selected matches using a crude rejection threshold. The second 
step is essentially a domestic version of the pyramidal image 
registration method of Thevenaz et ah [10]. 

B. Lab tests 

We made several SR experiments by using sequences of a 
bitonal resolution chart printed on an A4 paper sheet observed 
with a AVT-046B SVGA Marlin BAV camera. We acquired 
image sequences with variable inter-frame translation, rotation 
and zoom factor: some examples are shown in Fig. 12. 
Each frame of a sequence is registered with respect to the 
reference frame as explained in the previous section. We 
ran SR reconstructions with the three concurrent observation 
models and quadratic or hyperbolic regularization, subject 
to positivity constraint. For each setting, several values of 
the regularization parameter have been tried. Indeed, most 
of the time there is a certain range of (low) values of the 
parameter where differences between methods can easily be 
observed, whereas above some regularization strength, all 
methods become equivalent and yield an oversmoothed result. 

As a first example, we process a purely translational se- 
quence, using 7 frames with a PMF L = 3 and a quadratic 
regularization: comparison on a small (240 x 240) region is 
shown in figure 13, for a low value of A = 7.10~^. As 
expected, in this case E&Fl and TSO lead to quasi-identical 
results (PSNR = 68dB) whatever the parameter A, while E&FO 
shows some instability for low A. 

Fig. 14 and Fig. 15 show compared SR results on 7 frames 
of a sequence with both rotation (up to 25 degrees) and 
zoom (there is a factor 1.5 between the reference image 
and the farthest view). We use either quadratic regularization 
(upper part of the figures) or hyperbolic regularization with a 
threshold parameter s = 10 (lower part). 

For a low value of the regularization parameter (A = 
10 ~^ with quadratic term and A = 3.10"^ with hyperbolic 
regularization), see Fig. 14, E&FO and E&Fl suffer from 
artifacts in the form of a pseudo-periodic texture, which is 






Fig. 12. A sample of frames of the resolution chart, for various 
rotations and zoom factors, left column shows a zoom on the region 
used for further SR comparison. Up: reference frame, which is the 
most resolved one. 




Fig. 13. Reconstruction results with PMF L = 3 using 7 frames with 
global translation motion, in an under-regularized quadratic setting, 
A = 7.10"^. From left to right: E&FO, E&Fl and TSO models. 



of high amplitude in E&FO and less important, but manifest, 
in E&Fl. Not surprisingly, this phenomenon is amplified 
by the hyperbolic regularization. For the same regularization 
parameter, TSO does not encounter such instabilities, but 
exhibits ripples which are typical of an under-regularized 
quadratic solution, and appear amplified by the hyperbolic 
edge-preserving potential. 

For a more balanced value of the regularization parameter, 
see Fig. 15, E&FO is still clearly degraded by instabilities. 
E&Fl and TSO are now very close, but a careful examination 
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Fig. 14. Reconstruction results with PMF L = 3 using 7 frames with 
zoom and rotations, in an under-regularized setting. Up: quadratic 
regularization, A = 10~^. Down: hyperbolic regularization, s = 10, 
A = 3.10"^ From left to right: E&FO, E&Fl and TSO models. 



of both solutions reveals that small amplitude artifacts remain 
in the E&Fl reconstruction. 




Fig. 15. Reconstruction results with PMF L — 3 using 7 frames 
with zoom and rotations, using a balanced regularization strength. Up: 
quadratic regularization, A = 10~^. Down: hyperbolic regularization, 
s = 10, A = 3.10"^ From left to right: E&FO, E&Fl and TSO 
models. 



C Aerial sequence 

Fig. 16 displays the first and the last frames of an infrared 
sequence captured by an array sensor mounted on an airborne 
platform. As the plane gets closer to the scene, the last frame 
is the most resolved one and is chosen as the reference frame. 
The scene is a harbour with the sea and waterfront in the 
foreground, a building with a vertical antenna in the middle 
and a series of cans lined up in the background. Two ships 
are present in the right low part of the last frame. Because of 
perspective effects - the lowest part of the frame is closer to 
the sensor than the upper part - apparent motion is closer to 
an homography than an affinity. From the first frame to the 
reference one, the lower part (resp. upper part) of the field of 
view is magnified with a factor about 1.4 (resp. 1.6). Therefore 
our method can only be applied to small regions of the frames. 

Two regions are considered in the sequel: (i) in the upper 
part of the scene, the lined-up cans that remain unresolved in 
the reference frame (see Fig. 17) and (ii) in the right low part 
of the scene, the waterfront and the ships, see Fig. 20. 




(b) Last frame. 



Fig. 16. IR sequence captured by an airborne sensor, motion resuts 
from variable distance and small rotation. 




Fig. 17. Detail of the last (reference) frame. Lined-up cans zoomed up twice 
using bilinear interpolation. The cans are not resolved. The black vertical line 
in the low middle of the image is the antenna on the building seen in Fig. 16. 

We considered five frames of the sequence. Fig. 16 displays 
two of them. As already described, motion is estimated using 
SIFT on the whole sequence then the intensity based method 
of [10] is used to refine the SIFT estimate in each region. 
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SR reconstruction is performed with the algorithms of 
Sec. V-A, with quadratic regularization (s = oo) and positivity 
constraint. PMF L = 2 along both image axis. 

D. Upper region 

The observation models are compared through the SR 
reconstructions in Fig. 18. 




Fig. 18. Reconstructions obtained through E&FO (top image) , E&Fl (middle 
image) and TSO (bottom image) observation model. A = 5.10~^. 

The image quality in Fig. 18 gradually increases from the 
top image (E&FO) to the bottom image (TSO model). Even if 
the latter is still not a high quality image, the improvement 
in resolution enables the count of the right block of cans in 
the bottom image, whereas it is less obvious in the middle 
image and even impossible in the upper image. The results of 
Fig. 18 look somewhat oversmooth. So a lower regularization 
parameter has been tested, results are displayed in Fig. 19. 




Fig. 19. Reconstructions obtained through E&FO (top image) , E&Fl (middle 
image) and TSO (bottom image) observation model. A = 1.10~^. 

Fig. 19 reveals that E&FO and E&Fl are severely affected 
by the decrease of the regularization parameter, whereas our 
model seems more robust: artifacts appear in the right top part 
of the scene, but cans can still be counted. 



E. Right lower region 

Fig. 21 proposes similar results for the ships at the right 
low part of the scene. The ships appear in bright contrast. A 
bicubic interpolation of the last observed frame is provided in 
Fig. 20. The top image (E&FO model) in Fig. 21 has many 




Fig. 20. Detail of the last frame of Fig. 16. Low right part of the scene: 
waterfront and ships zoomed up twice using bicubic interpolation. 




Fig. 21. Reconstructions have been performed using E&FO (top 
image), E&Fl (middle image) and TSO (bottom image) observation 
model. A 10"^ 

localized high frequency artifacts, part of them are absent 
in the middle image (E&Fl model). These artifacts are not 
present in the bottom image (proposed TSO model). In the 
same time, comparison of SR results and Fig. 20 shows that 
resolution has indeed been increased. 

VII. Conclusion 

The presented paper deals with SR techniques in the field of 
aerial imagery. The proposed work focuses on the observation 
model in the case of an affine motion whereas the main part 



AN IMPROVED OBSERVATION MODEL FOR SUPER-RESOLUTION UNDER AFFINE MOTION. VERSION OF AUGUST 22, 2009 



12 



of SR literature deals with the inversion process or motion 
estimation. 

We analyzed the existing observation models used in SR 
reconstruction and emphasized their underlying assumptions, 
SO as to clarify their limitations. As a result, it is shown that 
these observation models fall into three categories: 

• exact computation 

• convolve-then-warp 

• warp-then-convolve 

Exact computation is not tractable for general motions. The 
convolve-then-warp approach is numerically efficient but is 
unable to capture large rotations and scale variations. So, 
only the third approach, due to Elad and Feuer is relevant 
in our framework. However, we have observed inaccuracies 
for rotations as low as 15° and zooming factor as low as 
20%. We succeeded in extending the E&F model to cover a 
more important range of affine transforms with high accuracy, 
for about 30% more computation time. The pointwise inter- 
polation stage in the E&F method has been replaced by L2 
functional approximation techniques. This technique combines 
a two-shear decomposition for the affine transform and a 1-D 
L2 projection on a shifted bspline basis. 

The proposed model has been compared with various E&F- 
like models. These models have been associated to several 
regularization settings to be tested for SR reconstruction 
purposes using synthetic and real image sequences. 

These tests have stressed the importance of the observation 
model in SR reconstruction when dealing with large zoom 
and rotation effects. In particular the choice of a bilinear 
interpolation instead of a nearest-neighbor one within an Elad 
and Feuer setting dramatically improves the reconstructions. 
Moreover, the proposed model consistently achieves even 
better results. 

Further research should be conducted to accurately deal 
with homographic motion, or piecewise parametric motion. 
It should unlock SR techniques to a larger application field. 
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